task tree
STAR: A Foundation Model-driven Framework for Robust Task Planning and Failure Recovery in Robotic Systems
Modern robotic systems, deployed across domains from industrial automation to domestic assistance, face a critical challenge: executing tasks with precision and adaptability in dynamic, unpredictable environments. To address this, we propose STAR (Smart Task Adaptation and Recovery), a novel framework that synergizes Foundation Models (FMs) with dynamically expanding Knowledge Graphs (KGs) to enable resilient task planning and autonomous failure recovery. While FMs offer remarkable generalization and contextual reasoning, their limitations, including computational inefficiency, hallucinations, and output inconsistencies, hinder reliable deployment. STAR mitigates these issues by embedding learned knowledge into structured, reusable KGs, which streamline information retrieval, reduce redundant FM computations, and provide precise, scenario-specific insights. The framework leverages FM-driven reasoning to diagnose failures, generate context-aware recovery strategies, and execute corrective actions without human intervention or system restarts. Unlike conventional approaches that rely on rigid protocols, STAR dynamically expands its KG with experiential knowledge, ensuring continuous adaptation to novel scenarios. To evaluate the effectiveness of this approach, we developed a comprehensive dataset that includes various robotic tasks and failure scenarios. Through extensive experimentation, STAR demonstrated 86% task planning accuracy and a 78% recovery success rate, showing significant improvements over baseline methods. The framework's ability to continuously learn from experience while maintaining structured knowledge representation makes it particularly suitable for long-term deployment in real-world applications.
- Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.95)
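The STAR abstract above describes storing learned knowledge in a reusable KG so that repeated situations do not trigger another FM call. A minimal sketch of that lookup-before-reasoning pattern might look as follows. Everything in it is hypothetical: the `RecoveryKG` class, the `(task, failure)` key, and the `stub_fm` function stand in for details the abstract does not specify.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature for a foundation-model call: (task, failure) -> recovery steps.
FMQuery = Callable[[str, str], List[str]]


class RecoveryKG:
    """Toy knowledge store mapping (task, failure) pairs to recovery steps."""

    def __init__(self) -> None:
        self._strategies: Dict[Tuple[str, str], List[str]] = {}
        self.fm_calls = 0  # counts how often the (expensive) FM was consulted

    def recover(self, task: str, failure: str, query_fm: FMQuery) -> List[str]:
        key = (task, failure)
        if key not in self._strategies:
            # Unknown situation: ask the FM once and store the answer for reuse.
            self.fm_calls += 1
            self._strategies[key] = query_fm(task, failure)
        return self._strategies[key]


def stub_fm(task: str, failure: str) -> List[str]:
    # Stand-in for a real FM; returns a fixed, made-up strategy.
    return [f"re-detect objects for '{task}'", f"retry after '{failure}'"]


kg = RecoveryKG()
first = kg.recover("pick cup", "grasp slipped", stub_fm)
second = kg.recover("pick cup", "grasp slipped", stub_fm)
```

The second call is answered from the stored entry, so `kg.fm_calls` stays at 1. A real system would need a richer graph structure and some way to match similar but non-identical failures, which this sketch does not attempt.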
Prompting Task Trees using Gemini: Methodologies and Insights
Robots are the future of technology: every advanced technology will eventually be used to build more efficient robots. The major challenge today is to train robots accurately and empathetically using knowledge representation. This paper offers insights into how unstructured knowledge representations can be converted into meaningful structured representations with the help of prompt engineering. These structured representations can eventually be used in robots to help them understand how the human brain can work wonders with minimal data or objects.
- North America > United States > Florida > Hillsborough County > Tampa (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
On the generalization capacity of neural networks during generic multimodal reasoning
Ito, Takuya, Dan, Soham, Rigotti, Mattia, Kozloski, James, Campbell, Murray
The advent of the Transformer has led to the development of large language models (LLMs), which appear to demonstrate human-like capabilities. To assess the generality of this class of models and a variety of other base neural network architectures to multimodal domains, we evaluated and compared their capacity for multimodal generalization. We introduce a multimodal question-answer benchmark to evaluate three specific types of out-of-distribution (OOD) generalization performance: distractor generalization (generalization in the presence of distractors), systematic compositional generalization (generalization to new task permutations), and productive compositional generalization (generalization to more complex task structures). We found that across model architectures (e.g., RNNs, Transformers, Perceivers, etc.), models with multiple attention layers, or models that leveraged cross-attention mechanisms between input domains, fared better. Our positive results demonstrate that for multimodal distractor and systematic generalization, either cross-modal attention or models with deeper attention layers are key architectural features required to integrate multimodal inputs. On the other hand, neither of these architectural features led to productive generalization, suggesting fundamental limitations of existing architectures for specific types of multimodal generalization. These results demonstrate the strengths and limitations of specific architectural components underlying modern neural models for multimodal reasoning. Finally, we provide Generic COG (gCOG), a configurable benchmark with several multimodal generalization splits, for future studies to explore.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > Dominican Republic (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
Consolidating Trees of Robotic Plans Generated Using Large Language Models to Improve Reliability
The inherent probabilistic nature of Large Language Models (LLMs) introduces an element of unpredictability, raising concerns about potential discrepancies in their output. This paper introduces an innovative approach that aims to generate correct and optimal robotic task plans for diverse real-world demands and scenarios. LLMs have been used to generate task plans, but they are unreliable and may contain wrong, questionable, or high-cost steps. The proposed approach uses an LLM to generate a number of task plans as trees and amalgamates them into a graph by removing questionable paths. Then an optimal task tree can be retrieved to circumvent questionable and high-cost nodes, thereby improving planning accuracy and execution efficiency. The approach is further improved by incorporating a large knowledge network. Leveraging GPT-4 further, the high-level task plan is converted into a low-level Planning Domain Definition Language (PDDL) plan executable by a robot. Evaluation results highlight the superior accuracy and efficiency of our approach compared to previous methodologies in the field of task planning.
- North America > United States > Florida > Hillsborough County > Tampa (0.14)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
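The consolidation idea in the abstract above (merge several LLM-generated plans into one graph, drop questionable paths, then retrieve a low-cost plan) can be sketched roughly as below. This is an illustration under stated assumptions, not the paper's method: plans are simplified to linear chains of steps rather than trees, a step counts as "questionable" when fewer than `min_votes` plans contain it, and the final retrieval uses Dijkstra's algorithm. The `Step` layout, the voting rule, and the costs are all hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical step layout: (state_before, action, state_after, cost).
Step = Tuple[str, str, str, float]
Graph = Dict[str, List[Tuple[str, str, float]]]


def consolidate(plans: List[List[Step]], min_votes: int = 2) -> Graph:
    """Merge plans into one graph, keeping steps proposed by >= min_votes plans."""
    votes = Counter(step for plan in plans for step in set(plan))
    graph: Graph = {}
    for (src, action, dst, cost), count in votes.items():
        if count >= min_votes:  # assumed pruning rule for questionable steps
            graph.setdefault(src, []).append((action, dst, cost))
    return graph


def cheapest_plan(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Dijkstra search returning the lowest-cost action sequence, or None."""
    frontier: List[Tuple[float, str, List[str]]] = [(0.0, start, [])]
    visited = set()
    while frontier:
        cost, state, actions = heapq.heappop(frontier)
        if state == goal:
            return actions
        if state in visited:
            continue
        visited.add(state)
        for action, nxt, step_cost in graph.get(state, []):
            heapq.heappush(frontier, (cost + step_cost, nxt, actions + [action]))
    return None


crack = ("raw egg", "crack", "egg in bowl", 1.0)
whisk = ("egg in bowl", "whisk", "beaten egg", 2.0)
blend = ("egg in bowl", "blend", "beaten egg", 5.0)
odd = ("raw egg", "microwave", "beaten egg", 0.5)  # appears in only one plan

plans = [[crack, whisk], [crack, blend], [crack, whisk], [crack, blend], [odd]]
graph = consolidate(plans)
best = cheapest_plan(graph, "raw egg", "beaten egg")
```

Here the cheap but unsupported `microwave` step is pruned before the search, so `best` is the `crack` then `whisk` route rather than the lowest-cost edge in the raw data.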
Task tree retrieval from FOON using search algorithms
Robots can be very useful for automating tasks and reducing the human effort required. But for a robot to know how to perform a task, we need to give it a clear set of steps to follow, and it is nearly impossible to provide a robot with instructions for every possible task. We therefore use the universal Functional Object-Oriented Network (FOON), which has been created and expanded and contains a large amount of existing recipe information [1]. Some tasks are complicated for robots to perform, just as some are complicated for humans. Weights have therefore been added to functional units to represent the chance that the robot executes the motion successfully [2]. Given a set of kitchen items and a goal node, a robot using the universal FOON must be able to determine whether the required items are present in the kitchen and, if so, obtain the steps that convert those items into the goal node. In this paper, we use two algorithms, iterative deepening search (IDS) and greedy best-first search (GBFS), to retrieve a task tree (if possible) for a goal node and a given set of kitchen items. The paper is organized as follows. Section II, FOON Creation, discusses the terminology related to FOON and its visualization. Section III, Methodology, discusses the IDS and GBFS search algorithms and the two heuristics implemented and used in GBFS. Section IV, Experiment/Discussion, compares the performance of the algorithms. The final Section V lists the references of the papers that have been cited.
- North America > United States > Florida > Hillsborough County > Tampa (0.15)
- Oceania > Australia > Queensland > Brisbane (0.05)
- Asia > South Korea > Daejeon > Daejeon (0.05)
- Asia > China > Shaanxi Province > Xi'an (0.05)
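The retrieval problem stated in the abstract above (given kitchen items and a goal node, find the steps that produce the goal) can be illustrated with a small iterative deepening search. This is a sketch under assumptions, not the paper's implementation: the `FunctionalUnit` layout with a single output object, the toy units, and the `max_depth` default are hypothetical, and success weights are ignored.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class FunctionalUnit(NamedTuple):
    """Simplified functional unit: input objects, a motion, one output object."""
    inputs: Tuple[str, ...]
    motion: str
    output: str


def depth_limited(goal: str, kitchen: Set[str], units: List[FunctionalUnit],
                  limit: int) -> Optional[List[FunctionalUnit]]:
    """Return units (in execution order) that produce `goal`, or None."""
    if goal in kitchen:
        return []  # already available, nothing to do
    if limit == 0:
        return None  # depth budget exhausted
    for unit in units:
        if unit.output != goal:
            continue
        tree: List[FunctionalUnit] = []
        for item in unit.inputs:
            sub = depth_limited(item, kitchen, units, limit - 1)
            if sub is None:
                break  # this unit cannot be satisfied, try the next one
            tree.extend(sub)
        else:
            return tree + [unit]
    return None


def ids_retrieve(goal: str, kitchen: Set[str], units: List[FunctionalUnit],
                 max_depth: int = 10) -> Optional[List[FunctionalUnit]]:
    """Iterative deepening: retry the depth-limited search with growing limits."""
    for limit in range(max_depth + 1):
        tree = depth_limited(goal, kitchen, units, limit)
        if tree is not None:
            return tree
    return None


units = [
    FunctionalUnit(("whole onion", "knife"), "chop", "chopped onion"),
    FunctionalUnit(("chopped onion", "oil", "pan"), "fry", "fried onion"),
]
kitchen = {"whole onion", "knife", "oil", "pan"}
tree = ids_retrieve("fried onion", kitchen, units)
motions = [unit.motion for unit in tree] if tree is not None else []
```

For this toy kitchen the search fails at depth limits 0 and 1 and succeeds at limit 2, giving the motions `chop` then `fry`. A GBFS variant would instead pick among candidate units using a heuristic, for example the success weights the abstract mentions.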
Task Tree Retrieval For Robotic Cooking
This paper develops different algorithms that generate task tree plans for a given goal node (recipe). The knowledge representation of the dishes is called FOON. It contains the different objects and the relationships between them with respect to the motion node. The graphical representation of FOON is built by observing changes in the state of an object with respect to the human manipulators. We explore how the FOON is created for different recipes by the robots. Task planning has difficulty with unknown problems, as its knowledge is limited to the FOON. To obtain the task tree plan for a given recipe, the robot retrieves the information of different functional units from FOON through the knowledge retrieval process. The generated subgraphs then allow the robot to cook the required dish by following the sequence of instructions.
- Oceania > Australia > Queensland > Brisbane (0.05)
- North America > United States > Florida > Hillsborough County > Tampa (0.05)
- Asia > South Korea > Daejeon > Daejeon (0.05)
Emergence of Collective Open-Ended Exploration from Decentralized Meta-Reinforcement Learning
Bornemann, Richard, Hamon, Gautier, Nisioti, Eleni, Moulin-Frier, Clément
Recent works have proven that intricate cooperative behaviors can emerge in agents trained using meta-reinforcement learning on open-ended task distributions using self-play. While the results are impressive, we argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world: through decentralized training and over an open-ended distribution of tasks. In this work we therefore investigate the emergence of collective exploration strategies, where several agents meta-learn independent recurrent policies on an open-ended distribution of tasks. To this end we introduce a novel environment with an open-ended, procedurally generated task space which dynamically combines multiple subtasks sampled from five diverse task types to form a vast distribution of task trees. We show that decentralized agents trained in our environment exhibit strong generalization abilities when confronted with novel objects at test time. Additionally, despite never being forced to cooperate during training, the agents learn collective exploration strategies which allow them to solve novel tasks never encountered during training. We further find that the agents' learned collective exploration strategies extend to an open-ended task setting, allowing them to solve task trees of twice the depth compared to the ones seen during training. Our open source code as well as videos of the agents can be found on our companion website.
- Energy > Oil & Gas > Upstream (0.95)
- Leisure & Entertainment > Games (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
From Cooking Recipes to Robot Task Trees -- Improving Planning Correctness and Task Efficiency by Leveraging LLMs with a Knowledge Network
Task planning for robotic cooking involves generating a sequence of actions for a robot to prepare a meal successfully. This paper introduces a novel task tree generation pipeline that produces correct plans and efficient execution for cooking tasks. Our method first uses a large language model (LLM) to retrieve recipe instructions and then utilizes a fine-tuned GPT-3 to convert them into a task tree, capturing sequential and parallel dependencies among subtasks. The pipeline then mitigates the uncertainty and unreliable features of LLM outputs using task tree retrieval. We combine multiple LLM task tree outputs into a graph and perform a task tree retrieval that avoids questionable and high-cost nodes, improving both planning correctness and execution efficiency. Our evaluation results show its superior performance compared to previous works in task planning accuracy and efficiency.
- North America > United States > Florida > Hillsborough County > Tampa (0.14)
- South America > Uruguay > Maldonado > Maldonado (0.04)
Robot Task Planning Based on Large Language Model Representing Knowledge with Directed Graph Structures
Zhen, Yue, Bi, Sheng, Xing-tong, Lu, Wei-qin, Pan, Hai-peng, Shi, Zi-rui, Chen, Yi-shu, Fang
Traditional robot task planning methods face challenges when dealing with highly unstructured environments and complex tasks. We propose a task planning method that combines human expertise with an LLM and have designed an LLM prompt template, Think_Net_Prompt, with stronger expressive power to represent structured professional knowledge. We further propose a method to progressively decompose tasks and generate a task tree to reduce the planning volume for each task, and we have designed a strategy to decouple robot task planning. By dividing different planning entities and separating the task from the actual machine binding process, the task planning process becomes more flexible. Research results show that our method performs well in handling specified code formats, understanding the relationship between tasks and subtasks, and extracting parameters from text descriptions. However, there are also problems such as limited complexity of task logic handling, ambiguity in the quantity of parts and the precise location of assembly. Improving the precision of task description and cognitive structure can bring certain improvements. https://github.com/NOMIzy/Think_Net_Prompt
- Asia > China (0.04)
- Africa > Cameroon > Gulf of Guinea (0.04)
- Research Report (0.70)
- Workflow (0.46)
Knowledge Retrieval Using Functional Object-Oriented Networks
Robotic agents often perform tasks that transform sets of input objects into output objects through functional motions. This work describes the FOON knowledge representation model for robotic tasks. We define the structure and key components of FOON and describe the process we followed to create our universal FOON dataset. The paper describes various search algorithms and heuristic functions we used to search for objects within the FOON. We performed multiple searches on our universal FOON using these algorithms and discussed the effectiveness of each algorithm.